1.4. Understanding Geospatial Data
Geospatial data, a key component of spatial analysis and geographic sciences, refers to information that is linked directly or indirectly to a specific location or geographical area. This section delves into the characteristics of geospatial data, its various forms, and its significance in spatial analysis.
The spatial dimension adds a geographical perspective to data analysis: placing raw data on a map makes patterns such as migration flows, urban development, and environmental shifts visible. Spatial analysis also reveals relationships between data points, such as proximity and clustering. These insights support resource allocation, urban planning, and emergency management, and they help ground policy and governance decisions in spatially referenced evidence.
1.4.1. Types of Geospatial Data
In the context of a Geographic Information System (GIS), real-world observations—which include any objects or events that are measurable in two or three dimensions—must be translated into simplified spatial representations. This process involves distilling complex, real-world details into fundamental spatial constructs that can be effectively managed and analyzed within a GIS framework. These constructs are then modeled in one of two ways:
Vector Data Model: This approach captures the geometry and location of spatial entities using points, lines, and polygons. It is adept at representing discrete features with clear boundaries and precise locations, such as buildings, roads, or administrative borders.
Fig. 1.10 An example of vector data.
Raster Data Model: In this model, the spatial entities are depicted as a uniform grid of cells, with each cell holding a value to represent a particular attribute of that area, such as elevation or temperature. It is suited for continuous data that doesn’t have distinct boundaries, like rainfall distribution or land surface temperatures.
Fig. 1.11 An example of raster data.
Both are fundamental to GIS and are chosen based on the nature of the data and the specific requirements of the analysis being performed.
1.4.1.1. Vector Data
Vector data is a way of representing real-world features within the context of spatial analysis and geographic information systems (GIS). Here’s a breakdown of its components:
Points: The most basic form of vector data, points represent discrete locations on the earth’s surface. Each point is defined by a pair of coordinates (for example, latitude and longitude) and can symbolize locations like cities, wells, or trees.
Example: The dataset from the Calgary Public Library contains information about library locations and their hours of operation. Here we only represent libraries and their locations.
import geopandas as gpd
import folium
import numpy as np
from shapely.geometry import Point
from IPython.display import display
# Read the data file
gdf = gpd.read_file('../data/Calgary_Public_Library_Locations_and_Hours_20240620.csv')
# Parse a '(latitude, longitude)' string into a Point; shapely expects (x, y) = (lon, lat)
def create_point(loc):
    lat, lon = map(float, loc.strip('()').split(', '))
    return Point(lon, lat)
# Convert the 'Location' column to Point objects
gdf['Location'] = gdf['Location'].apply(create_point)
# Extract (lon, lat) pairs into a NumPy array
coords = np.array([(point.x, point.y) for point in gdf['Location']])
# Use the median longitude and latitude of all points as the map center
median_lon, median_lat = np.median(coords, axis=0)
# Create a map centered on the median point (folium expects [lat, lon])
m = folium.Map(location=[median_lat, median_lon], zoom_start=10, tiles="cartodbpositron",
               control_scale=True)
# Add markers to the map
icon_size = (8, 8)  # Icon size in pixels (width, height)
for _, row in gdf.iterrows():
    folium.Marker(
        location=[row['Location'].y, row['Location'].x],
        popup=row['Library'],
        icon=folium.CustomIcon(
            icon_image='../data/simple_icon.png',  # Replace with the path to your icon image
            icon_size=icon_size,
            icon_anchor=(0, 0),  # The icon's top-left corner sits on the marker location
            popup_anchor=(0, 0)  # The popup opens at the icon anchor
        )
    ).add_to(m)
# Display the map
display(m)
Note - Map Scale
The scale in a GIS context is the ratio of a distance on the map to the corresponding distance on the ground. A large-scale map has a larger ratio (1:5,000 is larger than 1:50,000), so features appear relatively large and the map covers a smaller area in greater detail. At a scale of 1:5,000, 1 unit on the map equals 5,000 of the same units on the ground.
On these Folium maps, the scale is indicated in both kilometers and miles for convenience, such as 5 km or 5 mi, aiding in quick estimation of distances.
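The ratio described in this note can be turned into a small calculation. The sketch below is only illustrative: the function name `ground_distance_m` and the choice of centimetres and metres are assumptions made for the example.

```python
def ground_distance_m(map_distance_cm: float, scale_denominator: int) -> float:
    """Convert a distance measured on the map (cm) to a ground distance (m)."""
    # 1 cm on the map covers `scale_denominator` cm on the ground; 100 cm = 1 m
    return map_distance_cm * scale_denominator / 100.0


# 2 cm measured on a large-scale 1:5,000 map
print(ground_distance_m(2, 5_000))    # 100.0
# The same 2 cm on a smaller-scale 1:50,000 map covers ten times the distance
print(ground_distance_m(2, 50_000))   # 1000.0
```

The second call shows why a smaller-scale map covers more ground with less detail: the same length on the map stands for a much longer distance.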
Lines: Lines, or polylines, are sequences of points connected by straight segments that represent linear features such as rivers, roads, or utility lines. They are crucial for mapping routes and connections between different points.
Example: The following map displays the LRT tracks for the city of Calgary.
import geopandas as gpd
import folium
import pandas as pd
from shapely import wkt
from IPython.display import display
# Read the CSV file into a DataFrame
df = pd.read_csv('../data/Tracks_-_LRT_20240622.csv')
# Parse the WKT text in the 'the_geom' column into LineString geometries
df['geometry'] = df['the_geom'].apply(wkt.loads)
# Keep only the rows where RAIL_TYPE is 'LRT'
df_lrt = df[df['RAIL_TYPE'] == 'LRT']
# Create a GeoDataFrame
gdf_lrt = gpd.GeoDataFrame(df_lrt, geometry='geometry')
# Use the mean of the feature centroids as the initial map center
avg_lat = gdf_lrt.geometry.centroid.y.mean()
avg_lon = gdf_lrt.geometry.centroid.x.mean()
# Create a folium map centered on the average coordinates
m = folium.Map(location=[avg_lat, avg_lon], zoom_start=10, tiles="cartodbpositron", control_scale=True)
# Add each LineString to the map; folium expects (lat, lon), so swap each (x, y) pair
for _, row in gdf_lrt.iterrows():
    folium.PolyLine(
        locations=[(coord[1], coord[0]) for coord in row['geometry'].coords],
        color='green',  # Line color for the LRT tracks
        weight=2
    ).add_to(m)
# Display the map
display(m)
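Because a line is a sequence of points joined by segments, its length is the sum of its segment lengths. The following is a minimal sketch of that idea for latitude/longitude vertices, using the haversine formula on a spherical Earth. The function names and the sample coordinates are hypothetical and are not taken from the LRT dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(vertices):
    """Sum the segment lengths of a polyline given as [(lat, lon), ...]."""
    return sum(
        haversine_km(*start, *end)
        for start, end in zip(vertices, vertices[1:])
    )


# Hypothetical three-vertex line near downtown Calgary (illustrative coordinates only)
track = [(51.0447, -114.0719), (51.0460, -114.0600), (51.0500, -114.0500)]
print(f"{polyline_length_km(track):.2f} km")
```

For production work, a projected coordinate system or a geodesic library would usually give more accurate lengths than this spherical approximation.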
Polygons: Polygons are closed shapes formed by connecting line segments end-to-end so that the last segment returns to the first point. They represent areas such as lakes, park boundaries, or property lots. A polygon can have a complex outline, and properties such as area, perimeter, and centroid can be derived from its geometry.
Example: The following data is a representation of the City of Calgary’s boundary in a MULTIPOLYGON format.
import folium
import geopandas as gpd
from IPython.display import display
# Load the GeoPackage file
gdf = gpd.read_file('../data/City Boundary_20240620.gpkg')
# Compute the centroid of the merged boundary once and use it as the map center
centroid = gdf.geometry.unary_union.centroid
# Create a folium map centered on the boundary centroid (folium expects [lat, lon])
m = folium.Map(location=[centroid.y, centroid.x], zoom_start=9, tiles="cartodbpositron",
               control_scale=True)
# Add the boundary to the map as a GeoJSON layer
folium.GeoJson(gdf).add_to(m)
# Display the map
display(m)
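Area, perimeter, and centroid can all be derived from a polygon's vertices. The sketch below uses the shoelace formula and assumes a simple (non-self-intersecting, non-degenerate) polygon in planar, projected coordinates; it is not suitable for raw latitude/longitude values. The function name and the sample lot are hypothetical.

```python
import math


def polygon_properties(ring):
    """Return (area, perimeter, centroid) of a simple polygon.

    `ring` is a list of (x, y) vertices in planar (projected) coordinates,
    without repeating the first vertex at the end.
    """
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]   # wrap around to close the ring
        cross = x0 * y1 - x1 * y0    # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = twice_area / 2.0          # signed: positive for counter-clockwise rings
    centroid = (cx / (6.0 * area), cy / (6.0 * area))
    return abs(area), perimeter, centroid


# Hypothetical 40 m x 30 m rectangular lot
lot = [(0, 0), (40, 0), (40, 30), (0, 30)]
print(polygon_properties(lot))   # (1200.0, 140.0, (20.0, 15.0))
```

Libraries such as shapely expose the same quantities directly on their geometry objects; the hand-written version is shown only to make the calculation visible.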
Vector data is particularly valuable in applications that require high precision and detail. For example:
Cadastral Mapping: This involves creating maps that show property boundaries and land ownership. Precision is key here, as legal implications are involved.
Navigation Systems: GPS and other navigation tools use vector data to provide accurate turn-by-turn directions and route planning.
1.4.1.2. Raster Data
Raster data is a type of geospatial data representation that uses a matrix of cells, commonly referred to as pixels, to model the Earth’s surface and various phenomena. This method is particularly effective for capturing and conveying information that changes continuously over space, such as elevation, temperature, or land cover.
Grid of Pixels: Imagine a raster as a digital canvas where each pixel is a square paint dab. Each dab (pixel) carries specific information about that tiny square of the real world.
Pixel Values: The value of each pixel can represent different types of data. For example, in a temperature map, the pixel value might indicate the temperature at that location; in a digital elevation model, it would represent the height above sea level.
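To make the link between a location and a pixel value concrete, here is a small sketch that looks up the cell containing a given point. The grid values, origin, and cell size are all hypothetical, and the raster is assumed to be north-up with its origin at the top-left corner.

```python
# A tiny hypothetical elevation raster (metres); row 0 is the northern edge
elevation = [
    [1050, 1048, 1045],
    [1052, 1049, 1044],
    [1055, 1051, 1046],
]
ORIGIN_X, ORIGIN_Y = 500_000.0, 5_660_000.0   # top-left corner (assumed projected coords, m)
CELL_SIZE = 30.0                               # each pixel covers 30 m x 30 m


def sample(raster, x, y):
    """Return the value of the cell containing point (x, y), or None if outside."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)     # y decreases as the row index grows
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


print(sample(elevation, 500_040.0, 5_659_950.0))   # 1049
print(sample(elevation, 499_000.0, 5_659_950.0))   # None
```

Every point inside a cell returns the same value, which is why the cell size sets the level of detail a raster can represent.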
Example: Imagine a 2D array filled with random integers ranging from 0 to 255. By applying a colormap to a 2D plot of this array, we can create a visual representation. This method is akin to how we visualize diverse datasets, including elevation and land surface temperatures, to extract meaningful patterns from numerical values.
import numpy as np
import matplotlib.pyplot as plt
# Set the random seed for reproducibility
np.random.seed(0)
# Generate a random 10x10 array with values between 0 and 255
X = np.random.randint(256, size=(10, 10))
# Create the figure and axis objects with a specified size
fig, ax = plt.subplots(figsize=(6, 6))
# Display the array as an image with the 'Spectral' colormap
im = ax.imshow(X, cmap='Spectral')
# Set the aspect ratio of the axis to be equal
ax.set_aspect('equal')
# Add a colorbar to the figure with specified fraction and padding
cbar = fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
cbar.set_label('Color Intensity', rotation=270, labelpad=20, fontsize=16)
cbar.ax.tick_params(labelsize=16)
# Add a title to the plot
ax.set_title("Plot 2D Array with Spectral Colormap", fontsize=15)
# Disable the grid lines
ax.grid(False)
# Adjust layout to ensure everything fits without overlap
plt.tight_layout()
Some common uses of raster data include:
Satellite Imagery: These images are composed of raster data where each pixel corresponds to a specific area on the Earth’s surface, capturing details like land cover, urbanization, or agricultural fields.
Elevation Models: Digital Elevation Models (DEMs) use raster data to represent the terrain. Each pixel’s value indicates the elevation at that specific point, which is essential for flood modeling, land use planning, and even 3D visualization.
Example: The following code uses Earth Engine and geemap to display two raster layers over Calgary: elevation from the SRTM digital elevation model and surface water occurrence from the JRC Global Surface Water dataset. As with the random array above, each pixel holds a number, and a color palette turns those numbers into a readable map.
import ee
import geemap
import geemap.colormaps as cm
# Authenticate and initialize Earth Engine
ee.Authenticate()
ee.Initialize()
# Create a map centered on Calgary
Map = geemap.Map(center=[51.0447, -114.0719], zoom=10)
# Optionally switch the basemap (disabled here)
# Map.add_basemap('USGS 3DEP Elevation')
# Add an elevation layer
dem = ee.Image('CGIAR/SRTM90_V4')
elevation = dem.select('elevation')
vis_params = {'min': 0, 'max': 4000, 'palette': cm.palettes.dem}
Map.addLayer(elevation, vis_params, 'SRTM DEM (Version 4)')
Map.add_colorbar(vis_params, label="Elevation (m)", layer_name="SRTM DEM (Version 4)")
# Add a surface water layer, masking out pixels where water was never observed
water = ee.Image('JRC/GSW1_3/GlobalSurfaceWater')
occurrence = water.select('occurrence')
Map.addLayer(occurrence.updateMask(occurrence.gt(0)), {'palette': "blue"},
             'JRC Global Surface Water (v1.3)')
# Display the map
display(Map)
1.4.2. Attribute Tables
In Geographic Information Systems (GIS), attribute tables are essential components that store non-spatial data linked to spatial features. Each spatial feature on a map, such as a building, road, or land parcel, corresponds to a record in the attribute table. This record is connected to the feature through a unique numerical identifier known as a Feature Identifier (FID). For example, a park (spatial feature) on a GIS map may have an FID of 102, and its corresponding record in the attribute table could include attributes like area, vegetation type, and usage regulations.
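The link between a feature and its record can be sketched with two plain dictionaries keyed by FID. Everything in this example (the FIDs, names, and attribute fields) is hypothetical and only illustrates the lookup pattern, not how any particular GIS stores its tables.

```python
# Hypothetical features: FID -> geometry (here a simple list of (x, y) vertices)
geometries = {
    101: [(0, 0), (10, 0), (10, 10), (0, 10)],
    102: [(20, 0), (35, 0), (35, 12), (20, 12)],
}

# Hypothetical attribute table: one record per FID, holding non-spatial data
attribute_table = {
    101: {"name": "Civic Plaza", "type": "plaza", "vegetation": None},
    102: {"name": "Riverside Park", "type": "park", "vegetation": "mixed forest"},
}


def describe(fid):
    """Combine the spatial and non-spatial parts of one feature via its FID."""
    return {"fid": fid, "geometry": geometries[fid], **attribute_table[fid]}


park = describe(102)
print(park["name"], "has", len(park["geometry"]), "vertices")

# Attribute query: select the FIDs of all features whose type is 'park'
parks = [fid for fid, rec in attribute_table.items() if rec["type"] == "park"]
print(parks)   # [102]
```

A GeoDataFrame follows the same idea in a single structure: each row holds the attributes and a geometry column holds the shape.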
Example: The table below is an excerpt of the attribute table for the Calgary Community Boundaries dataset. Each row describes one community and is identified here by its row index. Columns such as CLASS, CLASS_CODE, COMM_CODE, NAME, and SECTOR hold the non-spatial attributes. The MULTIPOLYGON column stores each boundary as text, which the code parses into the geometry column, so every record carries both the shape of a community and its description.
import geopandas as gpd
import folium
from shapely import wkt
from IPython.display import display
# Read the data file (the CSV stores each boundary as WKT text)
gdf = gpd.read_file('../data/Community_District_Boundaries_20240620.csv')
# Parse the WKT text in the 'MULTIPOLYGON' column into geometry objects
gdf['geometry'] = gdf['MULTIPOLYGON'].apply(wkt.loads)
# Build the GeoDataFrame and declare its coordinate reference system as EPSG:4326
gdf = gpd.GeoDataFrame(gdf, geometry='geometry', crs='epsg:4326')
display(gdf)
| | CLASS | CLASS_CODE | COMM_CODE | NAME | SECTOR | SRG | COMM_STRUCTURE | CREATED_DT | MODIFIED_DT | MULTIPOLYGON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Residential | 1 | LEB | LEWISBURG | NORTH | DEVELOPING | BUILDING OUT | 2016/12/21 | 2019/11/26 | MULTIPOLYGON (((-114.0480237 51.1749865, -114.... | MULTIPOLYGON (((-114.04802 51.17499, -114.0471... |
| 1 | Residential | 1 | CSC | CITYSCAPE | NORTHEAST | DEVELOPING | BUILDING OUT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-113.9524996 51.1543075, -113.... | MULTIPOLYGON (((-113.95250 51.15431, -113.9700... |
| 2 | Industrial | 2 | ST1 | STONEY 1 | NORTH | N/A | EMPLOYMENT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-114.0133015 51.1744266, -114.... | MULTIPOLYGON (((-114.01330 51.17443, -114.0147... |
| 3 | Residential | 1 | MRT | MARTINDALE | NORTHEAST | ESTABLISHED | 1980s/1990s | 2016/12/21 | 2020/10/22 | MULTIPOLYGON (((-113.9648991 51.1251901, -113.... | MULTIPOLYGON (((-113.96490 51.12519, -113.9684... |
| 4 | Industrial | 2 | ST2 | STONEY 2 | NORTHEAST | N/A | EMPLOYMENT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-113.9939281 51.153327, -113.9... | MULTIPOLYGON (((-113.99393 51.15333, -113.9939... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 308 | Residential | 1 | DRN | DEER RUN | SOUTH | ESTABLISHED | 1980s/1990s | 2016/12/21 | 2024/04/15 | MULTIPOLYGON (((-114.0118593 50.9381207, -114.... | MULTIPOLYGON (((-114.01186 50.93812, -114.0118... |
| 309 | Major Park | 3 | FPK | FISH CREEK PARK | | | PARKS | 2024/04/02 | 2024/04/15 | MULTIPOLYGON (((-114.1109815 50.9214266, -114.... | MULTIPOLYGON (((-114.11098 50.92143, -114.1109... |
| 310 | Residual Sub Area | 4 | 02L | 02L | | | OTHER | 2016/12/21 | 2024/05/13 | MULTIPOLYGON (((-114.0945798 51.2123357, -114.... | MULTIPOLYGON (((-114.09458 51.21234, -114.0947... |
| 311 | Residential | 1 | ABR | AMBLERIDGE | NORTH | DEVELOPING | BUILDING OUT | 2024/05/13 | 2024/05/13 | MULTIPOLYGON (((-114.1295323 51.1977901, -114.... | MULTIPOLYGON (((-114.12953 51.19779, -114.1413... |
| 312 | Residential | 1 | GLR | GLACIER RIDGE | NORTH | DEVELOPING | BUILDING OUT | 2020/06/01 | 2024/05/13 | MULTIPOLYGON (((-114.1679438 51.196922, -114.1... | MULTIPOLYGON (((-114.16794 51.19692, -114.1679... |
313 rows × 11 columns
# Recreate the GeoDataFrame with the new geometry column
gdf = gpd.GeoDataFrame(gdf, geometry='geometry', crs='epsg:4326')
# Define a color map for different classes
color_map = {
    'Residential': 'red',
    'Industrial': 'blue',
    'Major Park': 'green',
    'Residual Sub Area': 'purple',
    # Add more classes and colors as needed
}
# Initialize the folium map centered around the mean coordinates of the geometries
m = folium.Map(location=[gdf.geometry.centroid.y.mean(), gdf.geometry.centroid.x.mean()],
               zoom_start=9, tiles="cartodbpositron", control_scale=True)
# Add each geometry to the map with a different color based on its class
for _, row in gdf.iterrows():
    folium.GeoJson(
        row['geometry'],
        # Bind the color as a default argument so each layer keeps its own value
        style_function=lambda feature, color=color_map.get(row['CLASS'], 'black'): {
            'fillColor': color,
            'color': color,
            'weight': 2,
            'fillOpacity': 0.6
        }
    ).add_to(m)
# Create a legend HTML
legend_html = '''
<div style="position: fixed;
            bottom: 50px; left: 50px; width: 150px; height: 150px;
            border:2px solid grey; z-index:9999; font-size:14px;
            background-color:white;
            ">
  <b> Legend </b><br>
  <i class="fa fa-square" style="color:red"></i> Residential <br>
  <i class="fa fa-square" style="color:blue"></i> Industrial <br>
  <i class="fa fa-square" style="color:green"></i> Major Park <br>
  <i class="fa fa-square" style="color:purple"></i> Residual Sub Area <br>
  <!-- Add more classes here if needed -->
</div>
'''
# Add the legend to the map
m.get_root().html.add_child(folium.Element(legend_html))
display(m)
1.4.3. Raster Data Attributes
Raster data represent spatial information through the value assigned to each pixel. When those values are integer codes, each code can stand for a category and be linked to a set of attributes. This is common in land cover datasets, where water bodies, forests, and urban areas are each assigned a distinct pixel value. An attribute table then describes each category, for example with the quality of water, the density of forest, or the regulations governing an urban zone.
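As a rough illustration of how integer pixel codes connect to an attribute table, the sketch below counts the pixels of each class in a tiny categorical grid and looks up each code's description. The grid, the class codes, and the 30 m cell size are all hypothetical.

```python
from collections import Counter

# Hypothetical categorical raster: each integer is a land cover class code
land_cover = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [1, 3, 3, 2],
]

# Hypothetical raster attribute table: class code -> descriptive attributes
raster_attributes = {
    1: {"class": "Water", "color": "blue"},
    2: {"class": "Forest", "color": "green"},
    3: {"class": "Urban", "color": "grey"},
}

CELL_AREA_M2 = 30 * 30   # assumed 30 m resolution

# Count pixels per class code, then look each code up in the attribute table
counts = Counter(code for row in land_cover for code in row)
for code, n_pixels in sorted(counts.items()):
    name = raster_attributes[code]["class"]
    print(f"{name}: {n_pixels} pixels, {n_pixels * CELL_AREA_M2} m^2")
```

The pixel values themselves carry no meaning beyond the code; the attribute table is what turns a 3 into "Urban".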
Example: Visualizing raster data with a heatmap. The script below generates a random 10×10 array with values from 0 to 255 to simulate a raster dataset, then draws it with the Seaborn library, annotating each cell with its value. The ‘Spectral’ colormap turns each value into a color, much as an attribute table turns each category code into a description.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Set the random seed for reproducibility
np.random.seed(0)
# Generate a random 10x10 array with values between 0 and 255
X = np.random.randint(256, size=(10, 10))
# Create the figure and axes objects with a specified size
fig, ax = plt.subplots(figsize=(6, 6))
# Create a heatmap using seaborn with annotated values and a specified color map
sns.heatmap(X, annot=True, fmt="d", cmap='Spectral', cbar_kws={'label': 'Color Intensity', 'fraction': 0.046}, ax=ax)
# Customize the colorbar
cbar = ax.collections[0].colorbar
cbar.set_label('Color Intensity', rotation=270, labelpad=20, fontsize=16)
cbar.ax.tick_params(labelsize=16)
# Add a title to the heatmap
ax.set_title("Heatmap with Annotated Pixels", fontsize=15)
# Disable grid lines
ax.grid(False)
# Ensure the heatmap cells are square and adjust the layout
ax.set_aspect('equal')
plt.tight_layout()
The heatmap makes the value of each pixel explicit, and discrete values like these could stand for land cover categories. Not all raster data formats support attribute tables, however. In many GIS applications, raster data are used without one, and the pixel values alone carry the spatial information.
Example: We can use geemap to visualize the MODIS Land Cover Type Product (MCD12Q1) data using Python. This example will create a map and add a layer to visualize the land cover data with a predefined color palette that corresponds to the International Geosphere-Biosphere Programme (IGBP) land cover classification.
import geemap
import ee
# Initialize the Earth Engine module.
ee.Initialize()
# Create an interactive map.
Map = geemap.Map(center=[51.0447, -114.0719], zoom=5)
# Set the visualization parameters.
igbpLandCoverVis = {
    "min": 1.0,
    "max": 17.0,
    "palette": [
        "05450a", "086a10", "54a708", "78d203", "009900", "c6b044",
        "dcd159", "dade48", "fbff13", "b6ff05", "27ff87", "c24f44",
        "a5a5a5", "ff6d4c", "69fff8", "f9ffa4", "1c0dff",
    ],
}
# Load the MODIS land cover data.
landcover = ee.Image("MODIS/006/MCD12Q1/2013_01_01").select("LC_Type1")
# Add the land cover layer to the map with the visualization parameters.
Map.addLayer(landcover, igbpLandCoverVis, "MODIS Land Cover")
# Add a legend to the map for the IGBP land cover classification.
Map.add_legend(builtin_legend="MODIS/006/MCD12Q1")
# Display the map.
display(Map)
1.4.4. Measurement Levels
Attributes in GIS are categorized into four measurement levels, each with distinct characteristics:
Nominal Data: These are categorical data without any numeric significance or order. For example, land use types such as residential, commercial, and industrial are nominal data.
Ordinal Data: This data type has a ranked order but no fixed interval between ranks. A soil erosion risk map might classify areas as low, moderate, or high risk, which are ordinal data.
Interval Data: Numeric data with equal intervals but no true zero point. Temperatures in Celsius or Fahrenheit are interval data: a one-degree difference means the same anywhere on the scale, but zero is an arbitrary reference point, so 20 °C is not twice as warm as 10 °C.
Ratio Data: Similar to interval data but with a meaningful zero point, allowing for the comparison of relative magnitudes. Examples include population counts and annual rainfall measurements, where zero represents none or no occurrence.
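The measurement level determines which summaries are meaningful. The short sketch below illustrates this with made-up values for each level; the variable names and numbers are purely illustrative.

```python
import statistics

land_use = ["residential", "commercial", "residential", "industrial"]   # nominal
erosion_risk = ["low", "high", "moderate", "low", "moderate"]           # ordinal
temps_c = [10.0, 20.0]                                                  # interval
rainfall_mm = [400.0, 800.0]                                            # ratio

# Nominal: only counting and the mode are meaningful
print(statistics.mode(land_use))                       # residential

# Ordinal: ranks can be compared, so a median is meaningful
rank = {"low": 0, "moderate": 1, "high": 2}
ordered = sorted(erosion_risk, key=rank.get)
print(ordered[len(ordered) // 2])                      # moderate

# Interval: differences are meaningful, ratios are not
print(temps_c[1] - temps_c[0])                         # 10.0
kelvin = [t + 273.15 for t in temps_c]
print(round(kelvin[1] / kelvin[0], 3))                 # 1.035, not 2.0

# Ratio: a true zero makes ratios meaningful
print(rainfall_mm[1] / rainfall_mm[0])                 # 2.0
```

Converting the Celsius values to Kelvin, which has a true zero, shows that 20 °C is only about 3.5% warmer than 10 °C in absolute terms, whereas 800 mm of rain really is twice 400 mm.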
1.4.5. Importance of Geospatial Data
Geospatial data finds application in a wide range of fields:
Urban Planning: City planners use geospatial data to design efficient transportation systems, plan zoning, and manage public utilities.
Environmental Conservation: Conservationists employ geospatial data to monitor habitats, track wildlife, and evaluate the impacts of climate change.
Public Safety: Emergency responders utilize geospatial data for optimizing routes, managing disasters, and allocating resources.
Business Intelligence: Companies harness geospatial data for market analysis, logistics, and strategic planning.
1.4.6. Challenges with Geospatial Data
Despite its numerous advantages, geospatial data also poses certain challenges:
Data Quality: Ensuring the accuracy and timeliness of geospatial data is crucial, as inaccuracies can have significant real-world implications.
Data Integration: Merging geospatial data from diverse sources and standards necessitates robust integration techniques.
Privacy Concerns: The collection and usage of geospatial data must respect the privacy rights of individuals.
In the subsequent sections, we will explore these aspects of geospatial data in greater detail, equipping you with the knowledge to effectively gather, process, and analyze spatial information.